Service migration across cluster boundaries

ABSTRACT

Embodiments provide migration of services across different clusters to balance utilization and meet customer demands. Different service migration options may be performed with or without downtime. The artifacts of the service are moved to a new destination cluster. The service is created on the new destination cluster and staged so that the service is almost ready to start. In one embodiment, the service is stopped on the old cluster and started on the new cluster. After stopping the service, DNS is updated to point to the service on the new cluster. In another embodiment, the service is stopped on the old cluster and started on the new cluster with the same IP address to avoid DNS reprogramming and associated delays. In a further embodiment, the migration is performed without downtime by moving the service part by part from one cluster to another.

BACKGROUND

Large scale data centers typically comprise organized clusters of hardware machines running collections of standard software packages, such as web servers, database servers, and the like. For fault tolerance and management reasons, the machines in a datacenter are typically divided into multiple clusters that are independently monitored and managed by a framework that coordinates resources for software applications. In one embodiment, the framework may be a Windows Azure™ Fabric Controller, for example, that provisions, supports, monitors, and commands virtual machines (VMs) and physical servers that make up the datacenter.

In existing datacenters, each tenant is deployed to a single cluster for its entire lifecycle, which allows the tenant's deployment to be managed by a single framework. This configuration may limit the tenant's growth, however, as expansion is limited to the machines within the single cluster. The tight coupling between tenants and clusters requires datacenter operators to maintain the capacity for a cluster at a level that will satisfy the potential future requirements for the tenants deployed on that cluster. Often, this results in the clusters operating at a low current utilization rate in anticipation of possible future needs. Even when excess capacity is maintained, this only improves the likelihood that a tenant's future needs will be supported. There is no guarantee that a tenant scale request will be limited to the reserved capacity and, therefore, at times a tenant may be unable to obtain the required capacity.

Limiting a service to one cluster also creates a single point of failure for that service. If the framework controlling that cluster fails, then the entire cluster will fail and all services supported on the cluster will be unavailable.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Embodiments of the invention allow a tenant's service to move among multiple clusters either with or without downtime. A service is associated with a particular IP address on a cluster. Users access the service using a domain name that is translated to the IP address by a domain name system (DNS) or other network location service. The service's IP address might or might not be changed when the service is moved among clusters.

A service may be migrated with downtime by staging a new instance of the service in a new cluster, waiting for the new instance to be ready, then stopping the original instance, and pointing the DNS name of the service to the IP address corresponding to the new deployment of the service on the new cluster.

Alternatively, the service may be migrated to the new cluster with downtime and may keep its original IP address. This would avoid the need for reprogramming DNS and the associated delays while the DNS caches are repopulated.

A further alternative for migrating the service is to perform the migration without downtime by moving the service part by part such that the service is always running in either or both of the clusters throughout the migration.

DRAWINGS

To further clarify the above and other advantages and features of embodiments of the present invention, a more particular description of embodiments of the present invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 is a block diagram illustrating a tenant migrator that is used to move services across different clusters.

FIG. 2 illustrates a service migration that has service downtime and requires DNS reprogramming.

FIG. 3 illustrates a service migration that has service downtime but that preserves the service's IP address.

FIG. 4 illustrates a service migration that eliminates service downtime and preserves the service's IP address.

FIG. 5 illustrates an example of a suitable computing and networking environment for tenant migration.

DETAILED DESCRIPTION

FIG. 1 is a block diagram illustrating a tenant migrator 11 that is used to move services across different clusters 12, 13. The tenant migrator 11 connects to all clusters in a datacenter. Once a datacenter operator decides to move a service between clusters, for example, to balance utilization or to meet tenant demands, the tenant migrator 11 identifies the right destination cluster for the service. Selection of the destination cluster may be based on factors such as utilization of the potential destination clusters, current demands made by the service, etc. Once a destination cluster is identified, tenant migrator 11 moves the service by creating and deleting instances on VMs 14, 15 on the original and new clusters.

The tenant migrator 11 controls whether the migration is performed with or without downtime, as selected by the operator. The tenant migrator 11 may request an update to DNS records if a new IP address is assigned to a service, or it may move an IP address to the new cluster if the service is keeping the same address. The service's presence is mutually exclusive during migration. For example, when a service is migrated, tenant migrator 11 ensures that, from the customer's perspective, two instances of the service are never running at the same time.
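
To make the destination-selection step concrete, the following minimal Python sketch shows one way a migrator might rank candidate clusters. It is illustrative only: the Cluster fields, the utilization heuristic, and the pick_destination function are assumptions, not part of the disclosed embodiments.

```python
# Minimal sketch of destination-cluster selection. The Cluster fields
# and the utilization heuristic are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    utilization: float   # fraction of capacity in use, 0.0-1.0
    free_vm_slots: int   # VM instances the cluster can still host

def pick_destination(clusters, source, required_vm_slots):
    """Return the least-utilized cluster that can absorb the service."""
    candidates = [c for c in clusters
                  if c is not source and c.free_vm_slots >= required_vm_slots]
    if not candidates:
        raise RuntimeError("no candidate cluster has sufficient capacity")
    return min(candidates, key=lambda c: c.utilization)
```

A production migrator would also weigh the current demands made by the service, as noted above.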

FIG. 2 illustrates a service migration that has service downtime and requires DNS reprogramming according to one embodiment. Tenant migrator 21 has identified a service running on cluster 22 that is to be moved to cluster 23. The old service is assigned an old IP address on cluster 22. In step 201, tenant migrator 21 identifies and copies service artifacts, such as code, bits, certificates, models, etc., from cluster 22. Using these artifacts, a new service is created in step 202 on cluster 23, but the service is not started.

Tenant migrator 21 directs the new cluster 23 to stage the new service in step 203. Cluster 23 selects the appropriate nodes and sets up the VMs to run the service in step 204. A new IP address on cluster 23 is assigned to the new service. Cluster 23 does not start the service at this point. The tenant migrator 21 waits in step 206 for the service to be staged on the new cluster, which is indicated, for example, in step 205.

Once the new service has been staged, tenant migrator 21 stops the old service in step 207, and then starts the new service in step 208. The old service is deleted from cluster 22 in step 209, which frees capacity for other services on that cluster to expand or to be added.

The tenant migrator then updates the central DNS record in step 210 so that the domain name for the service points to the appropriate new IP address on cluster 23. The DNS record updates may be performed simultaneously with steps 207 and 208 while the old service is stopped and the new service is started.

There is a period between stopping the old service in step 207 and starting the new service in step 208 when the service will not be available to users. Additionally, if users access the service using the domain name, then there may be additional delay while the DNS records are updated from the old IP address to the new IP address for the service's domain name. Because DNS supports many local caches distributed across the Internet, time is required to update all of these caches. Once the central DNS records are updated, the local DNS caches are cleared and updated with the new IP address. Until these updates occur, users will be directed to the old cluster 22, which is no longer running the service, and, therefore, attempts to use the service will fail.
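
As a rough illustration, the sequence of FIG. 2 might be driven by an orchestration routine along the lines below. The migrator, cluster, and DNS objects and all of their method names are hypothetical stand-ins for whatever control-plane APIs the datacenter exposes, not a real interface.

```python
# Hypothetical driver for the FIG. 2 flow (downtime plus DNS update).
# All objects and method names are assumed stand-ins, not a real API.
def migrate_with_dns_update(dns, old_cluster, new_cluster, service):
    artifacts = old_cluster.copy_artifacts(service)   # step 201
    new_cluster.create_service(service, artifacts)    # step 202
    new_cluster.stage_service(service)                # steps 203-204
    new_cluster.wait_until_staged(service)            # steps 205-206
    old_cluster.stop_service(service)                 # step 207
    new_cluster.start_service(service)                # step 208
    old_cluster.delete_service(service)               # step 209
    # Step 210: repoint the domain name at the new IP address. Local
    # DNS caches lag until their entries expire, so some requests may
    # still reach the old cluster and fail in the interim.
    dns.update_record(service, new_cluster.ip_address_of(service))
```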

FIG. 3 illustrates a service migration that has service downtime but that preserves the service's IP address according to one embodiment. Tenant migrator 31 has identified a service running on cluster 32 that is to be moved to cluster 33. The old service is assigned an IP address on cluster 32. In step 301, tenant migrator 31 identifies and copies service artifacts, such as code, bits, certificates, models, etc., from cluster 32. Using these artifacts, a new service is created in step 302 on cluster 33, but the service is not started.

Tenant migrator 31 directs the new cluster 33 to stage the new service in step 303. Cluster 33 selects the appropriate nodes and sets up the VMs to run the service in step 304. Cluster 33 does not start the service at this point. The tenant migrator 31 waits in step 306 for the service to be staged on the new cluster, which is indicated, for example, in step 305.

Once the new service has been staged, tenant migrator 31 stops the old service in step 307. In step 308, the IP address for the service is removed from cluster 32.

The IP address for the service is added to cluster 33 in step 309, and the new service on cluster 33 is started in step 310.

Finally, the old service is deleted from cluster 32 in step 311, which frees capacity for other services on that cluster to expand or to be added.

Because the IP address for the service has not changed, the tenant migrator does not need to update the DNS records as was required in the process illustrated in FIG. 2. Nevertheless, there is a period between stopping the old service in step 307 and starting the new service in step 310 when the service will not be available to users. However, once the new service is started, users may access the service using the domain name without waiting for any DNS record update delay. Local DNS caches will remain accurate because the domain name for the service is still associated with the same IP address.
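
A corresponding sketch of the FIG. 3 flow follows; again, every object and method name is an assumed stand-in. The key difference from FIG. 2 is that the IP address itself moves between clusters, so no DNS record changes.

```python
# Hypothetical driver for the FIG. 3 flow (downtime, IP preserved).
# All objects and method names are assumed stand-ins, not a real API.
def migrate_keeping_ip(old_cluster, new_cluster, service):
    artifacts = old_cluster.copy_artifacts(service)   # step 301
    new_cluster.create_service(service, artifacts)    # step 302
    new_cluster.stage_service(service)                # steps 303-304
    new_cluster.wait_until_staged(service)            # steps 305-306
    ip = old_cluster.ip_address_of(service)
    old_cluster.stop_service(service)                 # step 307
    old_cluster.remove_ip(service, ip)                # step 308
    new_cluster.add_ip(service, ip)                   # step 309
    new_cluster.start_service(service)                # step 310
    old_cluster.delete_service(service)               # step 311
    # No DNS update: cached name-to-IP mappings remain valid throughout.
```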

FIG. 4 illustrates a service migration that eliminates service downtime and preserves the service's IP address according to one embodiment. Tenant migrator 41 has identified a service running on cluster 42 that is to be moved to cluster 43. The old service is assigned an IP address on cluster 42. In step 401, tenant migrator 41 identifies and copies service artifacts, such as code, bits, certificates, models, etc., from cluster 42. Using these artifacts, a new service is created in step 402 on cluster 43, but the service is not started.

Tenant migrator 41 directs the new cluster 43 to stage the new service in step 403. Cluster 43 selects the appropriate nodes and sets up the VMs to run the service in step 404. The same IP address is used on both cluster 42 and cluster 43 for the service. The tenant migrator 41 waits in step 406 for the service to be staged on the new cluster, which is indicated, for example, in step 405.

Once the new service has been staged, tenant migrator 41 stops part of the old service in step 407. Tenant migrator 41 then starts the corresponding part of the new service in step 408. The network is also updated as necessary in step 408 to connect the started parts of the old and new service, as well as the load balancers and other routing components, allowing them to point to the started service across clusters 42, 43. Unlike the processes illustrated in FIGS. 2 and 3, only a portion of the service (e.g., a selected number of VMs or instances) is stopped in step 407 and then started in step 408. Tenant migrator 41 waits in step 409 for the part that was started on the new cluster to be ready for use.

Once the new part is ready in step 409, the tenant migrator repeats (410) steps 407-409 for the next part of the service. These steps continue in a loop 410 until all of the service has been moved piecemeal from old cluster 42 to new cluster 43. In one embodiment, one update domain's worth of the service is moved during each pass through loop 410. The tenant must already be prepared to lose an update domain during upgrades to the service, so those same segments can be used to partition the service for the inter-cluster migration.

After all of the parts of the service have been moved in loop 410, the old service is deleted from cluster 42 in step 411.

Because the IP address for the service has not changed, the tenant migrator does not need to update the DNS records as was done in the process illustrated in FIG. 2. There is no period during which the service is stopped on both clusters. Accordingly, the service will always be available to users without downtime.
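
The part-by-part flow of FIG. 4 might be sketched as the loop below. The update-domain partitioning and every method name are assumptions, and the network object stands in for the load balancers and routing components updated in step 408.

```python
# Hypothetical driver for the FIG. 4 flow (no downtime, IP preserved).
# Objects and method names are assumed stand-ins, not a real API.
def migrate_without_downtime(network, old_cluster, new_cluster, service):
    artifacts = old_cluster.copy_artifacts(service)   # step 401
    new_cluster.create_service(service, artifacts)    # step 402
    new_cluster.stage_service(service)                # steps 403-404
    new_cluster.wait_until_staged(service)            # steps 405-406
    # Loop 410: move one part (e.g., one update domain) per pass so the
    # service is always running in one cluster or the other.
    for part in old_cluster.update_domains(service):
        old_cluster.stop_part(service, part)          # step 407
        new_cluster.start_part(service, part)         # step 408
        # Point routing at only the running parts, which now span both
        # clusters behind the same IP address.
        network.route_to_running_parts(service, (old_cluster, new_cluster))
        new_cluster.wait_until_ready(service, part)   # step 409
    old_cluster.delete_service(service)               # step 411
```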

FIG. 5 illustrates an example of a suitable computing and networking environment on which the examples of FIGS. 1-4 may be implemented. For example, the tenant migrator 11 and/or VMs 14, 15 may be hosted on one or more computing systems 500. The computing system environment 500 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. A plurality of such computing systems 500 may be grouped to support clusters 12, 13 in a datacenter, for example. The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.

With reference to FIG. 5, an exemplary system for implementing various aspects of the invention may include a general purpose computing device in the form of a computer 500. Components may include, but are not limited to, various hardware components, such as processing unit 501, data storage 502, such as a system memory, and system bus 503 that couples various system components including the data storage 502 to the processing unit 501. The system bus 503 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as Mezzanine bus.

The computer 500 typically includes a variety of computer-readable media 504. Computer-readable media 504 may be any available media that can be accessed by the computer 500 and includes both volatile and nonvolatile media, and removable and non-removable media, but excludes propagated signals. By way of example, and not limitation, computer-readable media 504 may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 500. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above may also be included within the scope of computer-readable media. Computer-readable media may be embodied as a computer program product, such as software stored on computer storage media.

The data storage or system memory 502 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and random access memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 500, such as during start-up, is typically stored in ROM. RAM typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 501. By way of example, and not limitation, data storage 502 holds an operating system, application programs, and other program modules and program data.

Data storage 502 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, data storage 502 may be a hard disk drive that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive that reads from or writes to a removable, nonvolatile magnetic disk, and an optical disk drive that reads from or writes to a removable, nonvolatile optical disk such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The drives and their associated computer storage media, described above and illustrated in FIG. 5, provide storage of computer-readable instructions, data structures, program modules and other data for the computer 500.

A user may enter commands and information through a user interface 505 or other input devices such as a tablet, electronic digitizer, a microphone, keyboard, and/or pointing device, commonly referred to as a mouse, trackball or touch pad. Other input devices may include a joystick, game pad, satellite dish, scanner, or the like. Additionally, voice inputs, gesture inputs using hands or fingers, or other natural user interface (NUI) may also be used with the appropriate input devices, such as a microphone, camera, tablet, touch pad, glove, or other sensor. These and other input devices are often connected to the processing unit 501 through a user input interface 505 that is coupled to the system bus 503, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 506 or other type of display device is also connected to the system bus 503 via an interface, such as a video interface. The monitor 506 may also be integrated with a touch-screen panel or the like. Note that the monitor and/or touch screen panel can be physically coupled to a housing in which the computing device 500 is incorporated, such as in a tablet-type personal computer. In addition, computers such as the computing device 500 may also include other peripheral output devices such as speakers and a printer, which may be connected through an output peripheral interface or the like.

The computer 500 may operate in a networked or cloud-computing environment using logical connections 507 to one or more remote devices, such as a remote computer. The remote computer may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 500. The logical connections depicted in FIG. 5 include one or more local area networks (LAN) and one or more wide area networks (WAN), but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a networked or cloud-computing environment, the computer 500 may be connected to a public or private network through a network interface or adapter 507. In some embodiments, a modem or other means for establishing communications over the network may be used. The modem, which may be internal or external, may be connected to the system bus 503 via the network interface 507 or other appropriate mechanism. A wireless networking component, such as one comprising an interface and antenna, may be coupled through a suitable device such as an access point or peer computer to a network. In a networked environment, program modules depicted relative to the computer 500, or portions thereof, may be stored in the remote memory storage device. It may be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

What is claimed is:
1. A computer-implemented method, comprising: copying artifacts from a first instance of a service running on a first cluster in a computing environment on a network, the first cluster comprising a first group of virtual machines to run the first instance of the service, the first instance of the service comprising a first plurality of parts in a running state and being associated with an IP address; creating a second instance of the service on a second cluster in the computing environment on the network by selecting one or more nodes on the second cluster and configuring a second group of virtual machines to run the second instance of the service thereon using the artifacts copied from the first instance of the service, the second instance of the service comprising a second plurality of parts in a non-running state, each of the second plurality of parts corresponding to one of the first plurality of parts and also being associated with the IP address; stopping, on the first instance of the service on the first cluster, a first part from the first plurality of parts so that the first part from the first plurality of parts is in the non-running state; starting, on the second instance of the service on the second cluster, a corresponding part from the second plurality of parts so that the corresponding part from the second plurality of parts is in the running state; updating the network to provide access to only the parts from the first and second plurality of parts that are in the running state; performing the stopping, starting, and updating steps two or more times until all parts from the first plurality of parts are in the non-running state and all corresponding parts from the second plurality of parts are in the running state; and deleting the first instance of the service on the first cluster.
2. The method of claim 1, further comprising: assigning an IP address to the first and second instance of the service; and supporting the service simultaneously on both the first cluster and the second cluster using the IP address.
3. The method of claim 1, further comprising: after creating the second instance of the service on the second cluster, staging the second instance of the service so that the second instance of the service is ready to be in the running state on the second cluster but is not in the running state.
4. The method of claim 1, wherein the copied artifacts comprise one or more of code, certificates, and models.
5. The method of claim 1, wherein a tenant migrator manages stopping the first part and starting the corresponding part.
6. The method of claim 1, wherein the first cluster and the second cluster are located within a datacenter.
7. A computer program product for implementing a method for migrating services from a first cluster to a second cluster within a datacenter, the computer program product comprising one or more non-transitory computer-readable storage media having stored thereon computer-executable instructions that, when executed by one or more processors of a computing system, cause the computing system to perform the method comprising: copying artifacts for a first instance of a service on the first cluster; creating a second instance of the service on the second cluster; stopping a selected part from a first plurality of parts of the first instance of the service on the first cluster; starting a corresponding part from a second plurality of parts of the second instance of the service on the second cluster; and deleting the first instance of the service on the first cluster.
8. The computer program product of claim 7, wherein creating the second instance of the service on the second cluster includes using the copied artifacts for the first instance of the service on the first cluster.
9. The computer program product of claim 7, wherein each of the first plurality of parts of the first instance of the service on the first cluster is started, and wherein each of the second plurality of parts of the second instance of the service on the second cluster is stopped.
10. The computer program product of claim 7, further comprising: performing the stopping and starting steps two or more times until all selected parts of the first instance of the service have been stopped on the first cluster and all corresponding parts of the second instance of the service have been started on the second cluster.
11. The computer program product of claim 7, further comprising: assigning an IP address to the first instance of a service on the first cluster and further assigning the IP address to the second instance of the service on the second cluster; and supporting the first instance of the service and the second instance of the service simultaneously on both the first cluster and the second cluster using the IP address.
12. The computer program product of claim 11, wherein supporting the first instance of the service and the second instance of the service simultaneously on both the first cluster and the second cluster using the IP address further includes updating the network, load balancers, and/or other routing components to point the IP address to the started selected parts from the first plurality of parts and the started corresponding parts from the second plurality of parts.
13. A system having a processor, and memory with computer-executable instructions embodied thereon that, when executed by the processor, perform a method for migrating services from a first cluster to a second cluster within a datacenter on a network, the system comprising: a first cluster in a datacenter comprising a first group of virtual machines configured to run a first instance of a service, the first instance of the service configured to include a first plurality of parts in a running state and further configured for associating with an IP address; a second cluster in the datacenter comprising a plurality of virtual machines; a tenant migrator configured to: identify the first instance of the service in the first cluster in the datacenter for moving to the second cluster in the datacenter; copy artifacts associated with the service from the first cluster to the second cluster; create a second instance of the service on the second cluster by selecting one or more nodes on the second cluster; configure a second group of virtual machines to run the second instance of the service, the second instance of the service configured to include a second plurality of parts in a non-running state and further configured for associating with the IP address; stop, on the first instance of the service on the first cluster, a first part from the first plurality of parts so that the first part from the first plurality of parts is in the non-running state; start, on the second instance of the service on the second cluster, a corresponding part from the second plurality of parts so that the corresponding part from the second plurality of parts is in the running state; update the network to provide access to only the parts from the first and second plurality of parts that are in the running state; perform the stop and start steps two or more times until all parts of the first plurality of parts are in the non-running state and all corresponding parts of the second plurality of parts are in the running state; and delete the first instance of the service on the first cluster.
14. The system of claim 13, the tenant migrator further configured to: stage the second instance of the service so that the second instance of the service is configured to be changed to the running state on the second cluster but is not in the running state.
15. The system of claim 13, wherein the artifacts associated with the service comprise one or more of code, certificates, and models.